Abstract. We continue our analysis of volume and energy measures
that are appropriate for quantifying inductive inference systems. We extend
logical depth and conceptual jump size measures in AIT to stochastic
problems, and physical measures that involve volume and energy. We
introduce a graphical model of computational complexity that we believe 
to be appropriate for intelligent machines. We show several asymptotic 
relations between energy, logical depth and volume of computation for 
inductive inference. In particular, we arrive at a “black-hole equation” of 
inductive inference, which relates energy, volume, space, and algorithmic 
information for an optimal inductive inference solution. We introduce 
energy-bounded algorithmic entropy. We briefly apply our ideas to the 
physical limits of intelligent computation in our universe. 


“Everything must be made as simple as possible. But not simpler.” 

— Albert Einstein 


1 Introduction 

We initiated the ultimate intelligence research program in 2014, inspired by
Seth Lloyd’s similarly titled article on the ultimate physical limits to
computation [6], intended as a book-length treatment of the theory of
general-purpose AI. In a similar spirit to Lloyd’s research, we investigate the
ultimate physical limits and conditions of intelligence. A main motivation is to
extend the theory of intelligence using physical units, emphasizing the
physicalism inherent in computer science. This is the second installment of the
paper series. The first part [?] proposed that universal induction theory is
physically complete, arguing that the algorithmic entropy of a physical
stochastic source is always finite, and argued that if we choose the laws of
physics as the reference machine, the loophole in algorithmic information theory
(AIT) of choosing a reference machine is closed. We also introduced several new
physically meaningful complexity measures adequate for reasoning about
intelligent machinery using the concepts of minimum volume, energy and action,
which are applicable to both classical and quantum computers. Probably the most
important of the new measures was the minimum energy required to physically
transmit a message. The minimum energy complexity also naturally leads to an
energy prior, complementing the speed prior [?], which inspired our work on
incorporating physical resource limits into inductive inference theory.

In this part, we generalize logical depth and conceptual jump size to stochastic
sources and consider the influence of volume, space and energy. We consider
the energy efficiency of computing as an important parameter for an intelligent 
system, forgoing other details of a universal induction approximation. We thus 
relate the ultimate limits of intelligence to physical limits of computation. 


2 Notation and Background 

Let us recall Solomonoff’s universal distribution [?]. Let $U$ be a universal
computer which runs programs with a prefix-free encoding like LISP; $y = U(x)$
denotes that the output of program $x$ on $U$ is $y$, where $x$ and $y$ are bit
strings.^1 Any unspecified variable or function is assumed to be represented as
a bit string. $|x|$ denotes the length of a bit string $x$. $f(\cdot)$ refers to
function $f$ rather than its application.

The algorithmic probability that a bit string $x \in \{0,1\}^+$ is generated by a
random program $\pi \in \{0,1\}^+$ of $U$ is:

$$ P_U(x) = \sum_{U(\pi) \in x(0+1)^* \,\wedge\, \pi \in \{0,1\}^+} 2^{-|\pi|} \tag{1} $$

which conforms to Kolmogorov’s axioms. $P_U(x)$ considers any continuation of
$x$, taking into account non-terminating programs.^2 $P_U$ is also called the
universal prior, for it may be used as the prior in Bayesian inference, as any
data can be encoded as a bit string.

We also give the basic definition of Algorithmic Information Theory (AIT),
where the algorithmic entropy, or complexity, of a bit string $x \in \{0,1\}^+$ is

$$ H_U(x) = \min(\{|\pi| \mid U(\pi) = x\}) \tag{2} $$


We shall now briefly recall the well-known Solomonoff induction method
[17,18]. The universal sequence induction method of Solomonoff works on bit
strings $x$ drawn from a stochastic source $\mu$. Equation 1 is a semi-measure,
but that is easily overcome as we can normalize it. We merely normalize sequence
probabilities

$$ P'_U(x0) = \frac{P_U(x0) \cdot P'_U(x)}{P_U(x0) + P_U(x1)}, \qquad P'_U(x1) = \frac{P_U(x1) \cdot P'_U(x)}{P_U(x0) + P_U(x1)} \tag{3} $$

eliminating irrelevant programs and ensuring that the probabilities sum to 1,
from which point on $P'_U(x0|x) = P'_U(x0)/P'_U(x)$ yields an accurate prediction.

^1 A prefix-free code is a set of codes in which no code is a prefix of another.
A computer file uses a prefix-free code, ending with an EOF symbol; thus, most
reasonable programming languages are prefix-free.

^2 We used the regular expression notation in language theory.
The error bound for this method is the best known for any such induction
method. The total expected squared error between $P'_U(x)$ and $\mu$ is

$$ E_P\left[\sum_{m=1}^{n} \left(P'_U(a_{m+1} = 1 \mid a_1 a_2 \ldots a_m) - \mu(a_{m+1} = 1 \mid a_1 a_2 \ldots a_m)\right)^2\right] \le -\frac{1}{2} \ln P_U(\mu) \tag{4} $$

which is less than $-\frac{1}{2} \ln P'_U(\mu)$ according to the convergence
theorem proven in [?], and it is roughly $H_U(\mu) \ln 2$ [?]. Naturally, this
method can only work if the algorithmic complexity of the stochastic source is
finite, i.e., the source has a computable probability distribution. The
convergence theorem is quite significant, because it shows that Solomonoff
induction has the best generalization performance among all prediction methods.
In particular, the total error is expected to be a constant independent of the
input, and the error rate will thus rapidly decrease with increasing input size.
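
As a rough illustration of the convergence bound, the sketch below replaces the
(incomputable) universal mixture with a Bayesian mixture over three Bernoulli
sources and computes the expected total squared prediction error exactly. The
model class, the prior weights standing in for $2^{-|\pi|}$, and the function
name `expected_squared_error` are assumptions made for this example only; the
universal mixture itself ranges over all programs.

```python
import math

def expected_squared_error(thetas, weights, true_index, n):
    """Exact expected sum of squared errors of a finite Bayesian mixture.

    thetas:     P(next bit = 1) for each candidate Bernoulli source (assumed class)
    weights:    prior weights of the candidates, playing the role of 2^-|pi|
    true_index: index of the candidate that actually generates the data
    n:          number of predicted bits
    """
    theta_true = thetas[true_index]
    total = 0.0
    for m in range(n):              # length of the prefix seen so far
        for k in range(m + 1):      # number of ones in that prefix
            # Probability, under the true source, of a prefix with k ones.
            prefix_prob = math.comb(m, k) * theta_true**k * (1 - theta_true)**(m - k)
            # Unnormalized posterior of every candidate given such a prefix.
            joint = [w * t**k * (1 - t)**(m - k) for w, t in zip(weights, thetas)]
            # Mixture prediction that the next bit is 1.
            pred_one = sum(j * t for j, t in zip(joint, thetas)) / sum(joint)
            total += prefix_prob * (pred_one - theta_true) ** 2
    return total

thetas = [0.2, 0.5, 0.8]
weights = [0.25, 0.5, 0.25]
error = expected_squared_error(thetas, weights, true_index=2, n=50)
bound = -0.5 * math.log(weights[2])   # counterpart of -1/2 ln P(mu)
print(error, bound)
```

Under these assumptions the accumulated error should stay below the bound for
any `n`, which is the finite-class analogue of the constant total error stated
above.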

Operator induction is a general form of supervised machine learning where
we learn a stochastic map from question and answer pairs $(q_i, a_i)$ sampled
from a (computable) stochastic source $\mu$. Operator induction can be solved by
finding in available time a set of operator models $O^j(\cdot|\cdot)$ such that
the following goodness of fit is maximized

$$ \Psi = \sum_j \psi^j_n \tag{5} $$

for a stochastic source $\mu$, where each term in the summation is

$$ \psi^j_n = 2^{-|O^j(\cdot|\cdot)|} \prod_{i=1}^{n} O^j(a_i|q_i). \tag{6} $$

$q_i$ and $a_i$ are question/answer pairs in the input dataset, and $O^j$ is a
computable conditional pdf (cpdf) in Equation 6. We can use the found operators
to predict unseen data [?]

$$ P_U(a_{n+1}|q_{n+1}) = \sum_{j=1}^{n} \psi^j_n \, O^j(a_{n+1}|q_{n+1}) \tag{7} $$

The goodness of fit in this case strikes a balance between high a priori
probability and reproduction of data, as in the minimum message length (MML)
method, yet uses a universal mixture as in sequence induction. The convergence
theorem for operator induction was proven in [21] using Hutter’s extension to
arbitrary alphabet.
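
The balance between a priori probability and reproduction of data can be made
concrete with a toy mixture over three hand-written operators. This is only a
sketch: the three models, their code lengths in bits, and the assumption that
each goodness-of-fit term is a prior weight $2^{-\text{length}}$ times the
product of the probabilities assigned to the observed answers are all chosen
for illustration, and the final normalization over the two possible answers is
an added convenience rather than part of the method described above.

```python
import math

# Hypothetical candidate operators O(a|q): each returns the probability of
# answer `a` (0 or 1) given question `q` (an integer).
def parity_model(a, q):
    return 0.9 if a == q % 2 else 0.1

def always_one_model(a, q):
    return 0.9 if a == 1 else 0.1

def coin_model(a, q):
    return 0.5

# (operator, assumed code length in bits); the lengths are illustrative only.
MODELS = [(parity_model, 3), (always_one_model, 2), (coin_model, 1)]

def fit_terms(data):
    """Prior weight times likelihood of the data, for every candidate operator."""
    return [2.0 ** -length * math.prod(op(a, q) for q, a in data)
            for op, length in MODELS]

def predict(data, question, answer):
    """Mixture score of `answer`, normalized over the two possible answers."""
    terms = fit_terms(data)

    def score(a):
        return sum(term * op(a, question) for term, (op, _) in zip(terms, MODELS))

    return score(answer) / (score(0) + score(1))

DATA = [(1, 1), (2, 0), (3, 1), (4, 0)]   # (question, answer) pairs
print(fit_terms(DATA))
print(predict(DATA, question=5, answer=1))
```

With this data the parity operator dominates the mixture despite its longer
code, because the shorter operators reproduce the answers poorly.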

Operator induction infers a generalized conditional probability density
function (cpdf), and Solomonoff argues that it can be used to teach a computer
anything. For instance, we can train the question/answer system with physics
questions and answers, and the system would then be able to answer a new
physics question, dependent upon how much has been taught in the examples; a
future user could ask the system to describe a physics theory that unifies
quantum mechanics and general relativity, given the solutions of every
mathematics and physics problem ever solved in literature. Solomonoff’s original
training sequence plan proposed to instruct the system first with an English
subset and
basic algebra, and then venture into more complex subjects. The generality of
operator induction is partly due to the fact that it can be used to learn any
kind of association, i.e., it models an ideal content-addressable memory, but it
also implicitly generalizes any kind of law therein. That is why it can learn an
implicit principle (such as of syntax) from linguistic input, enabling the system
to acquire language; it can also model complex translation problems, and all
manner of problems that require additional reasoning (computation). In other
words, it is a universal problem solver model. It is also the most general of the
three kinds of induction, which are sequence, set, and operator induction, and
the closest to the machine learning literature. The popular applications of
speech and image recognition are covered by the operator induction model, as is
the wealth of pattern recognition applications, such as describing a scene in
English. We therefore think that operator induction is an AI-complete problem,
as hard as solving the human-level AI problem in general. It is with this in
mind that we analyze the asymptotic behavior of an optimal solution to the
operator induction problem.

3 Physical Limits to Universal Induction 

In this section, we elucidate the physical resource limits in the context of a 
hypothetical optimal solution to operator induction. We first extend Bennett’s 
logical depth and conceptual jump size to the case of operator induction, and 
show a new relation between expected simulation time of the universal mixture 
and conceptual jump size. We then introduce a new graphical model of
computational complexity, which we use to derive the relations among physical
resource bounds. We introduce a new definition of physical computation which we
call self-contained computation, which is a physical counterpart to a
self-delimiting program. The discovery of these basic bounds and relations,
exact and asymptotic, gives meaning to the complexity definitions of Part I.

Please note that Schmidhuber disagrees with the model of the stochastic
source as a computable pdf [15], but Part I contained a strong argument that
this was indeed the case. A stochastic source cannot have a pdf that is
computable only in the limit; if that were the case, it could have a random pdf,
which would have infinite algorithmic information content, and that is clearly
contradicted by the main conclusion of Part I. A stochastic source cannot be
semi-computable, because it would eventually run out of energy and hence the
ability to generate further quantum entropy; this holds especially for the
self-contained computations of this section. That is the reason we introduced
the notion of self-contained computation in the first place. Note also that
Schmidhuber agrees that quantum entropy does not accumulate to make the world
incompressible in general; therefore we consider his proposal that we should
view a cpdf as computable in the limit as too weak an assumption. As with Part I,
the analysis of this section is extensible to quantum computers, which is beyond
the scope of the present article.
3.1 Logical depth and conceptual jump size

Conceptual Jump Size (CJS) is the time required by an incremental inductive
inference system to learn a new concept, and it increases exponentially in
proportion to the algorithmic information content of the concept to be learned
relative to the concepts already known [?]. The physical limits to OOPS based
on Conceptual Jump Size were examined in [?]. Here, we give a more detailed
treatment. Let $\pi^*$ be the computable cpdf that exactly simulates $\mu$ with
respect to $U$, for operator induction.

$$ \pi^* = \arg\min_{\pi_j}(\{|\pi_j| \mid \forall x,y \in \{0,1\}^* : U(\pi_j, x, y) = \mu(x|y)\}) \tag{8} $$

The conceptual jump size of inductive inference (CJS) can be defined with
respect to the optimal solution program using Levin search [16]:

$$ \mathrm{CJS}(\mu) = \frac{t(\pi^*)}{P_U(\pi^*)} \le t(\mu) \le 2 \cdot \mathrm{CJS}(\mu) \tag{9} $$

where $t(\cdot)$ is the running time of a program on $U$.

$$ H_U(\pi^*) = -\log_2 P_U(\pi^*) = -\log_2 P_U(\mu) \tag{10} $$

$$ t(\mu) \le t(\pi^*) \, 2^{H_U(\mu)+1} \tag{11} $$

where $t(\mu)$ is the time for solving an induction problem from source $\mu$
with sufficient input complexity ($\gg H_U(\mu)$). We observe that the
asymptotic complexity is

$$ t(\mu) = O(2^{H_U(\mu)}) \tag{12} $$

for fixed $t(\pi^*)$. Note that $t(\pi^*)$ corresponds to the stochastic
extension of Bennett’s logical depth [1], which was defined as: “the running
time of the minimal program that computes x”. Let us recall that the minimal
program is essentially unique, a polytope in program space [5].

Definition 1. Stochastic logical depth is the running time of the minimal
program that accurately simulates a stochastic source $\mu$.

$$ L_U(\mu) = t(\pi^*) \tag{13} $$

which, with Equation 11, entails our first bound.

Lemma 1.

$$ t(\mu) \le L_U(\mu) \cdot 2^{H_U(\mu)+1} \tag{14} $$

Lemma 2. CJS is related to the expectation of the simulation time of the
universal mixture.

$$ \mathrm{CJS}(\mu) \le \sum_{U(\pi) \in x(0+1)^*} t(\pi) \cdot 2^{-|\pi|} = E_{P_U}[\{t(\pi) \mid U(\pi) \in x(0+1)^*\}] \tag{15} $$

where $x$ is the input data to sequence induction, without loss of generality.

Proof. Rewrite as $t(\pi^*) 2^{-|\pi^*|} \le \sum_{U(\pi) \in x(0+1)^*} t(\pi) \cdot 2^{-|\pi|}$.
Observe that the left-hand side of the inequality is merely a term in the
summation on the right.
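
The role Levin search plays in the definition of conceptual jump size can be
sketched with a toy scheduler. Everything below is an assumption made for
illustration: programs are abstracted as `(code, runtime, solves)` triples
instead of being executed, codes are prefix-free bit strings, and each phase
restarts every program from scratch. With this restart scheme and Kraft’s
inequality the total cost stays below $4 \cdot t(\pi^*) \cdot 2^{|\pi^*|}$; that
constant is a property of this sketch, not of the bounds stated in this section.

```python
def levin_search(programs, max_phase=64):
    """Return (code, total_steps, phase) for the first solver that finishes.

    programs: list of (code, runtime, solves) with prefix-free bit-string codes.
    In phase k, a program of length l <= k may run for 2^(k - l) steps.
    """
    total_steps = 0
    for phase in range(1, max_phase + 1):
        for code, runtime, solves in programs:
            if len(code) > phase:
                continue  # too long to receive any budget in this phase
            budget = 2 ** (phase - len(code))
            total_steps += min(budget, runtime)  # steps actually simulated
            if solves and runtime <= budget:
                return code, total_steps, phase
    raise RuntimeError("no solution found within max_phase")

# Hypothetical program table: a short program that never solves the problem
# and two longer programs that do.
TOY_PROGRAMS = [("0", 10**9, False), ("10", 5, True), ("110", 3, True)]

code, steps, phase = levin_search(TOY_PROGRAMS)
print(code, steps, phase)
```

In this toy run the solver with code `10` is found; its cost term is
$t \cdot 2^{|\pi|} = 5 \cdot 2^2 = 20$, and the search spends a small constant
multiple of that.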


3.2 A Graphical Analysis of Intelligent Computation

Let us introduce a graphical model of computational complexity that will help
us visualize the physical complexity relations that will be investigated. We do
not model the computation itself; we just enumerate the physical resources
required. The present treatment covers only classical computation over
sequential circuits.

Definition 2. Let the computation be represented by a directed bipartite graph
$G = (V, E)$ where vertices are partitioned into $V_o$ and $V_m$, which
correspond to primitive operations and memory cells respectively,
$V = V_o \cup V_m$, $V_o \cap V_m = \emptyset$. Function
$t : V \cup E \to \mathbb{Z}$ assigns time to vertices and edges.^3 Edges
correspond to causal dependencies. $I \subset V$ and $O \subset V$ correspond to
input and output vertices interacting with the rest of the world. We denote
access to vertex subsets with functions over $G$, e.g., $I(G)$.

Definition 2 is a low-level computational complexity model where the physical
resources consumed by any operation, memory cell, and edge are the same for
the sake of simplicity. Let $v_u$ be the unit space-time volume, $e_u$ be the
unit energy, and $s_u$ be the unit space.

Definition 3. Let the volume of computation be defined as $V_U(\pi)$, which
measures the space-time volume of computation of $\pi$ on $U$ in physical units,
e.g., m³·sec.

For Definition 2 it is $(|V(G)| + |E(G)|) \cdot v_u$. Volume of computation
measures the extent of the space-time region occupied by the dynamical evolution
of the computation of $\pi$ on $U$. We do not consider the theory of relativity.
For instance,
the space of a Turing Machine is the Instantaneous Description (ID) of it, and its 
time corresponds to . A Turing Machine derivation that has an ID of length 
$i$ at time $i$ and takes $t$ steps to complete would have a volume of $t(t+1)/2$.^4
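
The volume of such a derivation can be checked with a few lines of code. This
is a minimal sketch in which an Instantaneous Description is represented as a
plain string and one symbol counts as one unit of volume; both choices, and
the function names, are assumptions for illustration.

```python
def derivation_volume(ids):
    """Sum of the ID lengths over all steps of a derivation, in unit volumes."""
    return sum(len(step) for step in ids)

def triangular_volume(t):
    """Closed form t(t+1)/2 for IDs of length 1, 2, ..., t."""
    return t * (t + 1) // 2

print(derivation_volume(["A", "AA", "AAA"]), triangular_volume(3))
```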

Definition 4. Let the energy of computation be defined as $E_U(\pi)$, which
measures the total energy required by computation of $\pi$ on $U$ in physical
units, e.g., J.

For Definition 2 it is $E_U(\pi) = (|V(G)| + |E(G)|) \cdot e_u$.

Definition 5. Let the space of computation be defined as $S_U(\pi)$, which
measures the maximum volume of a synchronous slice of the space-time of
computation $\pi$ on $U$ in physical units, e.g., m³.

For Definition 2 it is

$$ \max_i |\{x \in V(G) \cup E(G) \mid t(x) = i\}| \cdot s_u \tag{16} $$
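
The three resource measures for the graphical model can be computed directly
from the timestamps. The sketch below assumes a hypothetical encoding of the
graph as two dictionaries (vertex to timestamp, edge to timestamp) and unit
constants that default to 1; neither the encoding nor the function name comes
from the definitions above.

```python
from collections import Counter

def resources(vertex_time, edge_time, v_u=1.0, e_u=1.0, s_u=1.0):
    """Return (volume, energy, space) of a timestamped computation graph.

    vertex_time: dict mapping vertex -> integer timestamp
    edge_time:   dict mapping (source, target) -> integer timestamp
    """
    elements = len(vertex_time) + len(edge_time)
    # Number of vertices and edges that share each timestamp.
    per_slice = Counter(vertex_time.values()) + Counter(edge_time.values())
    widest = max(per_slice.values()) if per_slice else 0
    return elements * v_u, elements * e_u, widest * s_u

# Toy graph: two memory cells feed an addition whose result is stored.
VERTEX_TIME = {"x": 0, "y": 0, "add": 1, "z": 2}
EDGE_TIME = {("x", "add"): 1, ("y", "add"): 1, ("add", "z"): 2}

print(resources(VERTEX_TIME, EDGE_TIME))
```

Volume and energy both count all seven elements, while space counts only the
widest synchronous slice, here the three elements stamped with time 1.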


Definition 6. In a self-contained physical computation all the physical resources 
required by computation should be contained within the volume of computation. 

^3 Time as discrete timestamps, as opposed to duration.

^4 If the derivation is A → AA → AAA, it has 1 + 2 + 3 = 6 volume.

Therefore, we do not allow a self-contained physical computation to send queries
over the internet, or use a power cord, for instance. 

Using these new more general concepts, we measure the conceptual jump
size in space-time volume rather than time (space-time extent might be a more
accurate term). Algorithmic complexity remains the same, as the length of a
program readily generalizes to the space-time volume of the program at the input
boundary of computation, which would be $V_U(G) = |I(G) \cap V_m(G)| \cdot v_u$
for Definition 2. If $y = U(x)$, bitstrings $x$ and $y$ correspond to $I(G)$ and
$O(G)$ respectively. A program $\pi$ usually corresponds to a vertex set
$V_\pi \subseteq I(G)$, and its size is denoted as $V_U(\pi)$. We use bitstrings
for data and programs below, but measure their sizes in physical units using
this notation. It is possible to eliminate bit strings altogether using a volume
prior; we mix notations only for ease of understanding. Let us generalize
logical depth to the logical volume of a bit string $x$:

$$ L^V_U(x) = V_U(\arg\min_{\pi}\{V_U(\pi) \mid U(\pi) \in x(0+1)^*\}) \tag{17} $$

Let us also generalize stochastic logical depth to stochastic logical volume:

$$ L^V_U(\mu) = V_U(\pi^*) \tag{18} $$

which entails that Conceptual Jump Volume (CJV), and logical volume $V_U$ of a
stochastic source may be defined analogously to CJS

$$ \mathrm{CJV}(\mu) = L^V_U(\mu) \cdot 2^{H_U(\mu)} \le V_U(\mu) \le 2 \cdot \mathrm{CJV}(\mu) \tag{19} $$

where the left-hand side corresponds to the space-time extent variant of CJS.
Likewise, we define logical energy for a bit string, and stochastic logical
energy:

$$ L^E_U(x) \triangleq E_U(\arg\min_{\pi}\{V_U(\pi) \mid U(\pi) \in x(0+1)^*\}), \qquad L^E_U(\mu) \triangleq E_U(\pi^*) \tag{20} $$

which brings us to an energy-based statement of conceptual jump size, which we
term conceptual jump energy, or conceptual gap energy:

Lemma 3. $\mathrm{CJE}(\mu) = E_U(\pi^*) \cdot 2^{H_U(\mu)} \le E_U(\mu) \le 2 \cdot \mathrm{CJE}(\mu)$.

The inequality holds since we can use $E_U(\cdot)$ bounds in universal search
instead of time. We now show an interesting relation which is the case for
self-contained computations.

Lemma 4. If all basic operations and basic communications spend constant
energy for a fixed space-time extent (volume), then:

$$ E_U(\pi^*) = O(V_U(\pi^*)), \qquad E_U(\mu) = O(L^V_U(\mu)). $$

One must spend energy to conserve a memory state, or to perform a basic
operation (in a classical computer). We may assume the constant complexity
of primitive operations, which holds in Definition 2. Let us also assume that
the space complexity of a program is proportional to how much mass is required.
Then, the energy from the rest mass of an optimal computation may be taken
into account, which we call total energy complexity (in metric units):
Lemma 5.

$$ E_t(\pi^*) = d_e V_U(\pi^*) + S_U(\pi^*) d_m c^2 $$

$$ L^{E_t}_U(\mu) = d_e L^V_U(\mu) + S_U(\mu) d_m c^2 = O(L^V_U(\mu) + S_U(\mu)) $$

where $c$ is the speed of light, energy density $d_e = e_u/v_u$, and mass density
$d_m = m_u/s_u$ for the graphical model of complexity.

Lemma 6. Conceptual jump total energy (CJTE) of a stochastic source is:

$$ \mathrm{CJTE}(\mu) \triangleq E_t(\pi^*) \cdot 2^{H_U(\mu)} \le E_t(\mu) \le 2 \cdot \mathrm{CJTE}(\mu). \tag{21} $$

As a straightforward consequence of the above lemmas, we show a lower
bound on the energy required that is related linearly to the volume and space,
and exponentially to the algorithmic complexity of a stochastic source, for
optimal induction.

Theorem 1. $\mathrm{CJTE}(\mu) = (d_e L^V_U(\mu) + S_U(\mu) d_m c^2) \cdot 2^{H_U(\mu)} \le E_t(\mu) \le 2 \cdot \mathrm{CJTE}(\mu)$

Proof. We assume that the energy density is constant; we can use $E_t(\cdot)$
for resource bounds in Levin search. The inequality is obtained by substituting
Lemma 5 into the definitional inequality.
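
The substitution in the proof can be followed numerically. The sketch below
assumes SI units, treats the logical volume, space, densities, and algorithmic
entropy as given inputs, and uses hypothetical function names; it only
evaluates the expressions of Lemma 5 and Lemma 6 and does not compute any of
these quantities for a real source.

```python
C = 299_792_458.0  # speed of light in m/s

def total_energy(volume, space, d_e, d_m):
    """Lemma 5: operating energy d_e * V plus rest-mass energy S * d_m * c^2."""
    return d_e * volume + space * d_m * C ** 2

def cjte_bounds(volume, space, d_e, d_m, entropy_bits):
    """Return (CJTE, 2 * CJTE), the bounds on the total energy of induction."""
    cjte = total_energy(volume, space, d_e, d_m) * 2.0 ** entropy_bits
    return cjte, 2.0 * cjte

# Illustrative inputs only: 10 units of volume at 2 J per unit, no rest mass,
# and a source with 3 bits of algorithmic entropy.
print(cjte_bounds(volume=10.0, space=0.0, d_e=2.0, d_m=1.0, entropy_bits=3))
```

The linear dependence on volume and space and the exponential dependence on
algorithmic complexity are visible in the two factors of `cjte`.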

The last inequality gives bounds for the total energy cost of inferring a source
$\mu$ in relation to space-time extent (volume of computation), space
complexity, and an exponent of algorithmic complexity of $\mu$. This inspires us
to define priors using CJV, CJE, and CJTE, which would extend Levin’s ideas
about resource-bounded Kolmogorov complexity, such as Kt complexity. In the
first installment of the ultimate intelligence series, we introduced complexity
measures and priors based on energy and action. We now define the one that
corresponds to CJE and leave the rest as future work due to lack of space.

Definition 7. Energy-bounded algorithmic entropy of a bit string is defined as:

$$ H_e(x) = \min\{|\pi| + \log_2 E_U(\pi) \mid U(\pi) = x\} \tag{22} $$
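
Definition 7 trades program length against energy in the same way that Kt
complexity trades length against time. A minimal sketch, assuming we are
handed a finite table of programs that all output the same string, each
described by a hypothetical `(length_in_bits, energy)` pair with energy
expressed in multiples of the unit energy:

```python
import math

def energy_bounded_entropy(candidates):
    """Minimum of length + log2(energy) over (length_bits, energy) pairs."""
    return min(length + math.log2(energy) for length, energy in candidates)

# Three hypothetical programs for the same output: shorter ones cost more energy.
CANDIDATES = [(10, 1024.0), (12, 64.0), (15, 4.0)]
print(energy_bounded_entropy(CANDIDATES))
```

Here the longest program wins because its energy saving outweighs the extra
five bits of length.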

3.3 Physical limits, incremental learning, and digital physics 

Landauer’s limit is a thermodynamic lower bound of $kT \ln 2$ J for erasing 1
bit, where $k$ is the Boltzmann constant and $T$ is the temperature [1]. The total
number of bit-wise operations that a quantum computer can evolve is 2E/h 
operations where E is average energy, and thus the physical limit to energy 
efficiency of computation is about 3.32 x 10^^ operations/J [5]. Note that the 
Margolus-Levitin limit may be considered a quantum analogue of our relation 
of the volume of computation with total energy, which is called E.t “action 
volume” in their paper, as it depends on the quantum of action h which has E.t 
units. Bremermann discusses the minimum energy requirements of computation 
and communication in [5]. Lloyd [6] assumes that all the mass may be converted
to energy and calculates the maximum computation capacity of a 1 kilogram 
“black-hole computer”, performing 10®^ operations over 10^^ bits. According to 
an earlier paper of his, the whole universe may not have performed more than 
$10^{120}$ operations over $10^{90}$ bits [7].
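
For a sense of scale, Landauer’s bound can be evaluated at a chosen
temperature. This is a small sketch; the temperature of 300 K and the function
names are assumptions for the example, and the Boltzmann constant is the exact
value fixed by the 2019 SI.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact in the 2019 SI

def landauer_limit(temperature_kelvin):
    """Minimum energy in joules to erase one bit at the given temperature."""
    return BOLTZMANN * temperature_kelvin * math.log(2)

def max_erasures_per_joule(temperature_kelvin):
    """Upper bound on the number of bit erasures one joule can pay for."""
    return 1.0 / landauer_limit(temperature_kelvin)

print(landauer_limit(300.0), max_erasures_per_joule(300.0))
```

At room temperature the bound is on the order of $10^{-21}$ J per erased bit.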
Corollary 1. $H(\mu) \le 397.6$ for any $\mu$ where the logical volume is 1.

Proof. $V(\mu) \le 10^{120}$. Assume that $L^V(\mu) = 1$. Then
$\log_2(2^{H(\mu)+1}) \le 3.321 \times 120$, so $H(\mu) + 1 \le 398.6$.

Therefore, if $\mu$ has a greater algorithmic complexity than about 400 bits,
its discovery without any a priori information would not have been guaranteed.
Digital physics theories suggest that the physical law could be much simpler
than that, however, as there are very simple universal computers in the
literature [8], a survey of which may be found in [10], which means,
interestingly, that the universe may have had enough time to discover its basic
law.
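
The arithmetic of the corollary is easy to reproduce. The sketch below assumes
the budget of $10^{120}$ operations implied by the factor $3.321 \times 120$ in
the proof; the `logical_volume` parameter is an added generalization for
illustration and is not part of the corollary.

```python
import math

def max_entropy_bits(budget_log10=120, logical_volume=1.0):
    """Largest H satisfying logical_volume * 2^(H + 1) <= 10^budget_log10."""
    return budget_log10 * math.log2(10) - math.log2(logical_volume) - 1.0

print(max_entropy_bits())
```

With unit logical volume this gives roughly 397.6 bits, matching the corollary;
every doubling of the logical volume lowers the limit by one bit.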

This limit shows the remarkable importance of incremental learning, as both
Solomonoff [53] and Schmidhuber [?] have emphasized, which is part of ongoing
research. We proposed previously that incremental learning is an AI axiom [?].
Optimizing energy efficiency of computation would also be an obviously useful
goal for a self-improving AI. This measure was first formalized by Solomonoff in
[?], which he imagined would be optimizing performance in units of bits/(sec·J)
as applied to inductive inference, which we agree with, and will eventually
implement in our Alpha Phase 2 machine; Alpha Phase 1 has already been partially
implemented in our parallel incremental inductive inference system [?].